mlatoz

Linear Regression

Good

Bad


Simple Linear Regression

y = β0 + β1 * x + ε


  Where:
  
    - y is the dependent variable (the variable we want to predict).
  
    - x is the independent variable (the variable used to make predictions).
  
    - β0 is the intercept, representing the value of y when x is zero.
  
    - β1 is the slope of the regression line, indicating how much y changes for each unit change in x.
  
    - ε represents the error term, which accounts for the variability of y that is not explained by the regression line.
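
To make the roles of these terms concrete, the sketch below simulates data from the model. The function name `simulate_simple_linear`, the parameter values, and the choice of normally distributed noise are illustrative assumptions, not part of the original material.

```python
import numpy as np

def simulate_simple_linear(x, beta0, beta1, noise_std, seed=0):
    """Draw y = beta0 + beta1 * x + epsilon with Gaussian noise (an assumption)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # epsilon is the error term: variation in y not explained by the line
    epsilon = rng.normal(0.0, noise_std, size=x.shape)
    y = beta0 + beta1 * x + epsilon
    return y, epsilon

x = np.array([0.0, 1.0, 2.0, 3.0])
y, epsilon = simulate_simple_linear(x, beta0=1.0, beta1=2.0, noise_std=0.5)

# Subtracting the error term leaves the deterministic part beta0 + beta1 * x
line_part = y - epsilon
print(line_part)
```

With `noise_std=0` every point falls exactly on the line; larger values scatter the points further from it.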

Ordinary Least Squares

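OLS estimates β0 and β1 by choosing the line that minimizes the sum of squared differences between the observed and predicted values of Y. The fitted line is:

Y = β0 + β1 * X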


  Where:
  
    - Y is the dependent variable (the one we want to predict or explain).
  
    - X is the independent variable (the predictor or explanatory variable).
  
    - β0 is the intercept (the value of Y when X is 0).
  
    - β1 is the slope (the change in Y for a one-unit change in X).

The OLS estimates of the slope and the intercept are:

β1 = Σ((Xi - X̄)(Yi - Ȳ)) / Σ((Xi - X̄)²)
β0 = Ȳ - β1 * X̄


  Where:
  
    - Σ denotes summation over all data points (i = 1, ..., n).
  
    - Xi is the value of the independent variable for the ith data point.
  
    - Yi is the value of the dependent variable for the ith data point.
  
    - X̄ is the mean of all X values.
  
    - Ȳ is the mean of all Y values.
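
The two formulas above translate almost directly into NumPy. The following is a minimal sketch; the helper name `ols_fit` and the toy data are hypothetical, with the data chosen so that the points lie exactly on y = 1 + 2x.

```python
import numpy as np

def ols_fit(x, y):
    """Return (beta0, beta1) for simple linear regression via the closed-form OLS formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point (x_mean, y_mean)
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # toy data lying exactly on y = 1 + 2x
beta0, beta1 = ols_fit(x, y)
print(f"intercept = {beta0:.3f}, slope = {beta1:.3f}")
```

On real data the result should agree with library routines such as `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second, up to floating-point error.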

Python Code Template

Traditional Template

  # Import necessary libraries
  import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  import statsmodels.api as sm

  # Load the data
  data = pd.read_csv("1.01. Simple linear regression.csv")

  # Define the dependent and the independent variables
  y = data['GPA']
  x1 = data['SAT']

  # Explore the data
  plt.scatter(x1, y)
  plt.xlabel('SAT', fontsize=20)
  plt.ylabel('GPA', fontsize=20)
  plt.show()

  # Regression itself
  x = sm.add_constant(x1)
  results = sm.OLS(y, x).fit()
  print(results.summary())

  # Plot the data with the fitted line: ŷ = b₀ + b₁x₁
  # For this data set the fitted values are approximately b₀ = 0.275 and b₁ = 0.0017
  b0 = results.params['const']
  b1 = results.params['SAT']
  yhat = b0 + b1 * x1

  plt.scatter(x1, y)
  plt.plot(x1, yhat, lw=4, c='orange', label='regression line')
  plt.xlabel('SAT', fontsize=20)
  plt.ylabel('GPA', fontsize=20)
  plt.legend()
  plt.show()

Modern Template

  # Import necessary libraries
  import numpy as np
  import matplotlib.pyplot as plt
  from sklearn.model_selection import train_test_split
  from sklearn.linear_model import LinearRegression
  from sklearn.metrics import mean_squared_error
  
  # Generate some example data
  np.random.seed(42)
  X = 2 * np.random.rand(100, 1)
  y = 4 + 3 * X + np.random.randn(100, 1)
  
  # Split the data into training and testing sets
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  
  # Create a Linear Regression model
  model = LinearRegression()
  
  # Train the model on the training data
  model.fit(X_train, y_train)
  
  # Make predictions on the test data
  y_pred = model.predict(X_test)
  
  # Evaluate the model
  mse = mean_squared_error(y_test, y_pred)
  rmse = np.sqrt(mse)
  print(f"Root Mean Squared Error: {rmse}")
  
  # Plot the training data, the test data, and the regression line
  plt.scatter(X_train, y_train, label='Training Data')
  plt.scatter(X_test, y_test, label='Test Data')
  plt.plot(X_test, y_pred, color='red', linewidth=3, label='Regression Line')
  plt.xlabel('X-axis label')
  plt.ylabel('Y-axis label')
  plt.title('Simple Linear Regression')
  plt.legend()
  plt.show()
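
A fitted scikit-learn model can also be read back in terms of the equation Y = β0 + β1 * X: `intercept_` holds the estimate of β0 and `coef_` holds the estimate of β1. The sketch below is a self-contained variant of the template above, assuming the same kind of synthetic data (true intercept 4, true slope 3); the variable names and the use of `r2_score` are additions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 4 + 3x + Gaussian noise (assumed for illustration)
rng = np.random.RandomState(42)
X = 2 * rng.rand(100, 1)
y = 4 + 3 * X[:, 0] + rng.randn(100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

intercept = float(model.intercept_)  # estimate of β0
slope = float(model.coef_[0])        # estimate of β1
r2 = r2_score(y_test, model.predict(X_test))  # share of test variance explained

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, R^2 = {r2:.3f}")
```

Because the data are noisy, the estimates should land near 4 and 3 rather than exactly on them.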
